M7a: regex foundation — /pat/flags literals + regex value + $contains/$split #24
Merged
…iguation
The tokenizer scans a regex when '/' appears in operand position (tracked via
the previous token) and division otherwise; depth-tracking finds the closing /.
Empty // -> S0301, unterminated -> S0302. Parser emits a {type=regex} node.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
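The operand-position test and the closing-slash scan described above can be sketched in plain Lua. This is a hypothetical sketch, not the project's tokenizer: the token shape ({ type = ..., value = ... }), the set of token types treated as ending an operand, and the names slash_is_regex and scan_regex are assumptions. The depth rule (a / inside (...), [...] or {...} does not close the literal, and neither does an escaped \/) is modelled on jsonata-js's scanRegex as we understand it.

```lua
-- Hypothetical sketch, not the project's tokenizer.
-- Tokens are assumed to look like { type = "number" | "string" | "name" |
-- "variable" | "value" | "regex" | "operator", value = ... }.

-- A '/' starts a regex literal unless the previous token can end an operand,
-- in which case it is the division operator.
local ENDS_OPERAND = { number = true, string = true, name = true,
                       variable = true, value = true, regex = true }
local CLOSERS = { [")"] = true, ["]"] = true, ["}"] = true }

local function slash_is_regex(prev)
  if prev == nil then return true end                 -- start of input
  if ENDS_OPERAND[prev.type] then return false end
  if prev.type == "operator" and CLOSERS[prev.value] then return false end
  return true
end

-- Scan a /pat/flags literal; pos is the 1-based index just after the opening
-- '/'. Returns pattern, flags, next_pos, or raises an error table carrying
-- S0301 (empty pattern) or S0302 (unterminated literal).
local function scan_regex(src, pos)
  local start, depth = pos, 0
  -- A '/' closes the literal only at bracket depth 0 and when preceded by an
  -- even number of backslashes (i.e. the slash itself is not escaped).
  local function is_closing_slash(p)
    if src:sub(p, p) ~= "/" or depth ~= 0 then return false end
    local backslashes = 0
    while p - backslashes > 1
      and src:sub(p - backslashes - 1, p - backslashes - 1) == "\\" do
      backslashes = backslashes + 1
    end
    return backslashes % 2 == 0
  end

  while pos <= #src do
    local c = src:sub(pos, pos)
    if is_closing_slash(pos) then
      local pattern = src:sub(start, pos - 1)
      if pattern == "" then error({ code = "S0301", position = pos }) end
      pos = pos + 1
      local flag_start = pos
      -- Only the i and m flags are consumed; scanning stops at anything else.
      while src:sub(pos, pos) == "i" or src:sub(pos, pos) == "m" do
        pos = pos + 1
      end
      return pattern, src:sub(flag_start, pos - 1), pos
    end
    -- Track unescaped bracket nesting so a '/' inside a group or class is skipped.
    local escaped = src:sub(pos - 1, pos - 1) == "\\"
    if not escaped and c:find("^[%(%[{]$") then depth = depth + 1 end
    if not escaped and c:find("^[%)%]}]$") then depth = depth - 1 end
    pos = pos + 1
  end
  error({ code = "S0302", position = pos })
end
```

For example, scanning "/a(b/c)d/im + 1" from position 2 yields the pattern a(b/c)d with flags im, because the inner / sits at depth 1.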
src/jsonata/regex.lua wraps lrexlib-pcre2 (lazy require; i->CASELESS,
m->MULTILINE; byte->char offsets). A regex node evaluates to a first-class
_jsonata_function whose impl returns a {match,start,end,groups,next} object;
next() throws D1004 on a zero-width match. rockspec gains lrexlib-pcre2.
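The match-object protocol above can be illustrated without PCRE2. In this hypothetical sketch Lua's own string.find patterns stand in for the lrexlib-pcre2 engine (so the pattern syntax is Lua's, not PCRE's), 1-based byte positions are converted to 0-based character offsets by counting UTF-8 lead bytes, and next() raises D1004 when the following match is zero-width, as jsonata-js does. The names get_engine, byte_to_char_index and make_regex are illustrative assumptions, not the module's API.

```lua
-- Hypothetical sketch; Lua patterns stand in for PCRE2.

-- Lazy engine loading: rex_pcre2 is only required on first use, so a program
-- that never evaluates a regex never loads PCRE2.
local rex_pcre2
local function get_engine()
  if rex_pcre2 == nil then rex_pcre2 = require("rex_pcre2") end
  return rex_pcre2
end

-- 1-based byte position -> 0-based character index: count the UTF-8 lead
-- bytes (every byte that is not a 10xxxxxx continuation byte) before it.
local function byte_to_char_index(s, byte_pos)
  local _, n = s:sub(1, byte_pos - 1):gsub("[^\128-\191]", "")
  return n
end

-- A regex value is a callable: regex(str[, from_byte]) -> match object | nil.
local function make_regex(pattern)
  local function closure(str, from_byte)
    local found = { str:find(pattern, from_byte or 1) }
    local s, e = found[1], found[2]
    if s == nil then return nil end
    local groups = {}
    for i = 3, #found do groups[#groups + 1] = found[i] end
    local result = {
      match = str:sub(s, e),
      start = byte_to_char_index(str, s),        -- 0-based, inclusive
      ["end"] = byte_to_char_index(str, e + 1),  -- 0-based, exclusive
      groups = groups,
    }
    -- next(): the following match, or nil once the subject is exhausted.
    -- A zero-width follow-up match would never advance, so it is an error.
    function result.next()
      if e + 1 > #str then return nil end
      local following = closure(str, e + 1)
      if following and following.match == "" then
        error({ code = "D1004", value = pattern })
      end
      return following
    end
    return result
  end
  return closure
end
```

On the subject "é a1 b2" the first match "a1" starts at byte 4 but at character 2, which is the kind of offset translation the adapter has to perform.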
Also fixes a pre-existing eval_path bug: a parenthesized block as the first
path step (e.g. (...).field) was evaluated per-element over a NOTHING input
and skipped, so navigating into its result returned undefined. Blocks are now
treated as self-contained first steps (evaluated once over the whole input),
matching function-call/variable handling and JSONata semantics.
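A toy model of the eval_path fix, under loudly stated assumptions: the step shape ({ self_contained = ..., eval = ... }), the use of nil for NOTHING, and this eval_path are hypothetical stand-ins, not the project's evaluator. It shows why mapping a block head over a NOTHING input yields nothing, while evaluating it once over the whole input lets the next step navigate into its result.

```lua
-- Toy model only: hypothetical step shape, nil stands for NOTHING.
-- A step is { self_contained = boolean, eval = function(ctx) -> value | nil }.
local function eval_path(steps, input)
  local seq
  local head = steps[1]
  if head.self_contained then
    -- Block / variable / function-call head: evaluated once over the whole input.
    seq = { head.eval(input) }
  else
    -- Ordinary head: mapped over each input item; a nil (NOTHING) input has
    -- no items, so the head is never evaluated and the path yields nothing.
    local items = (type(input) == "table" and input[1] ~= nil) and input or { input }
    seq = {}
    for _, item in ipairs(items) do
      local v = head.eval(item)
      if v ~= nil then seq[#seq + 1] = v end
    end
  end
  -- Remaining steps map over the current sequence, dropping nil results.
  for i = 2, #steps do
    local nxt = {}
    for _, item in ipairs(seq) do
      local v = steps[i].eval(item)
      if v ~= nil then nxt[#nxt + 1] = v end
    end
    seq = nxt
  end
  return seq
end
```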
Both ($contains and $split) gain a regex branch driven by applying the regex value; string args keep their existing behaviour. Signatures: $contains <s-(sf):b>, $split <s-(sf)n?:a<s>>. Unblocks the (sf) signatures deferred in M5b.
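The regex branches can be sketched as follows, assuming a regex value is a callable that returns a { match, start, end, next } object with 0-based character offsets (end exclusive), as in the regex-value commit. The stand-in regex constructor uses Lua patterns on ASCII input, so byte and character offsets coincide. All function names, the plain-substring string branch of contains, and the limit handling are modelled on jsonata-js and are assumptions here, not this port's code; limit validation is omitted.

```lua
-- Hypothetical sketch of the regex branches of $contains and $split.

-- Stand-in regex value: Lua patterns, ASCII only (byte offsets == char offsets).
-- Calling it returns { match, start, ["end"], next } with 0-based offsets.
local function regex(pat)
  local function at(str, from)                  -- from: 1-based byte position
    local s, e = str:find(pat, from)
    if not s then return nil end
    return {
      match = str:sub(s, e),
      start = s - 1,                            -- 0-based, inclusive
      ["end"] = e,                              -- 0-based, exclusive
      next = function()
        if e + 1 > #str then return nil end
        local following = at(str, e + 1)
        -- A zero-width follow-up match would loop forever.
        if following and following.match == "" then error({ code = "D1004" }) end
        return following
      end,
    }
  end
  return function(str) return at(str, 1) end
end

local function is_regex(v) return type(v) == "function" end

-- $contains(str, token): a regex token is applied; a string token is a
-- plain (non-pattern) substring search.
local function contains(str, token)
  if str == nil then return nil end
  if is_regex(token) then return token(str) ~= nil end
  return str:find(token, 1, true) ~= nil
end

-- Regex branch of $split(str, separator, limit): walk the match chain,
-- emitting the text between consecutive matches, up to `limit` pieces.
local function split_regex(str, re, limit)
  if limit == 0 then return {} end
  local m = re(str)
  if m == nil then return { str } end
  local result, count, start = {}, 0, 0         -- start: 0-based offset
  while m ~= nil and (limit == nil or count < limit) do
    result[#result + 1] = str:sub(start + 1, m.start)
    start = m["end"]
    m = m.next()
    count = count + 1
  end
  if limit == nil or count < limit then
    result[#result + 1] = str:sub(start + 1)
  end
  return result
end
```

With a %d+ separator, "a1b22c" splits into a, b and c; with a limit of 1 only a is returned.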
…ro regressions
M7a added lrexlib-pcre2 (lazy-loaded PCRE2 binding); CI must install the PCRE2 C library and the rock so the regex tests can require rex_pcre2.
Summary
Lands the regex foundation for JSONata:
/pat/flags literals, a first-class callable regex value, a lazy PCRE2-backed engine adapter, and the $contains/$split regex builtins. Faithful to jsonata-js v2.2.1. ($match/$replace follow in M7b.)

- 3a877ae: the tokenizer scans /pat/flags (i/m flags) when / appears in operand position (decided from the previous token) and division otherwise; depth-tracking finds the closing /. Empty // → S0301, unterminated → S0302. Parser emits a {type=regex} node.
- b9b1b0f: a new src/jsonata/regex.lua wraps lrexlib-pcre2, required lazily (i → CASELESS, m → MULTILINE; byte → char offsets). A regex evaluates to a first-class _jsonata_function whose impl returns a {match,start,end,groups,next} object; next() throws D1004 on a zero-width match. ($type(/x/) → "function".)
- fc034ee: $contains/$split accept a string or a regex (driven by applying the regex value); string args keep their existing behaviour. Signatures <s-(sf):b> / <s-(sf)n?:a<s>>; this also unblocks the (sf) signatures deferred in M5b.
- lrexlib-pcre2 (lazy): the zero-dependency drop-in property holds for everything except actual regex usage (verified: non-regex programs never load PCRE2).

A small Task-2 side-fix (a parenthesized (block) path-head is now self-contained) additionally turned 3 more non-regex official cases green.

Results
- … /-disambiguation, division, and existing string $contains/$split unchanged.
- … / disambiguation matrix, $contains/$split regex, regex-value match/start/end/flags/global-iterator, S0301/S0302, joins) is oracle-faithful.

Deferred to M7b (documented; suite-invisible in M7a)
- … .groups unwrap + cons-array navigation (entangled: the real fix is holistic cons-array spread/indexing, best done when $match makes groups central).
- … = comparison crash ($string = $number crashes identically).
- … H.serialize skip the match-object next field; hoist is_regex/apply to H; $split O(n²) re-slicing.

Test plan
- busted spec/: 514/0
- busted spec/jsonata_suite_spec.lua: zero-regression guard green
- bash scripts/run-suite.sh: 1261/1682
- … rex_pcre2 / disambiguation matrix

🤖 Generated with Claude Code